The COVID-19 pandemic has led to an unprecedented increase in economic inequality in the US. This broadening disparity underlies several social issues plaguing our country, and understanding it could hold the key to the well-being of many groups of people. To analyze trends in and formulate solutions to this inequality, we will be looking at a group of datasets containing information on wealth distribution in the US.
Throughout the past decade, economic inequality has intensified to levels never seen before. While there are a variety of potential contributing factors - technological change, globalization, and unfair governmental policy shifts, among many others - financial crises such as the one brought by the COVID-19 pandemic have served to accelerate the widening of the gap between the poor and the rich. This is cause for great concern; with large sections of society being left behind with very little economic opportunity and mobility, countrywide economic growth is negatively impacted, political polarization is deepened, and communal tensions are stoked.
It is thus imperative that socially conscious data scientists analyze the issue of economic inequality. One common way to do so is to collect data on the distribution of wealth, which measures how a nation's wealth (including real estate, consumer durables, private businesses, and other financial assets) is distributed among the population. In our project, we look at datasets containing information on wealth distribution by generation, race, education, and income strata. We hope to analyze these datasets to obtain trends in economic inequality over time and identify potential actionable solutions.
Project Framing: There was no preparing for the COVID-19 pandemic. Closed businesses left individuals unemployed while they also battled the everyday hardships of the pandemic, such as illness, medical bills, and overall uncertainty. Wealth inequality in the United States has always been high, but given the shock created by COVID-19, our group wanted to examine the effects of the pandemic on this distribution. By understanding how different groups of people were affected by the pandemic, we can see whether some groups were systematically better or worse equipped for it. Finally, the better we understand the needs of individuals, the more likely we are to be able to implement effective policies to help them in times of need.
Human Values: At the core of this project is the value that community brings to a group. Communities provide individuals with a sense of belonging as well as a support system to help achieve individual goals. These goals come from the inalienable rights to life and liberty that all people have. In order to strengthen communities and empower individuals to achieve their goals, it is important to know which individuals may need additional support in order to gain equal opportunity. This sense of community is bolstered by the other core value of our project, equity. While inequality is incredibly difficult to completely snuff out, providing opportunities to those who need them forms the basis of equity and communal harmony.
Stakeholders: The primary stakeholders in this situation are the individuals who will be using this data. These include policy makers and non-governmental organizations who design policies and social programs to support those who can’t support themselves. Indirect stakeholders include the individuals who are unable to purchase everyday necessities due to hardship brought on by the pandemic. Other indirect stakeholders are individuals who have more than they might need, since programs may target their money.
Benefits and Harm: If an intervention takes place, the group that was least prepared for the COVID-19 pandemic would likely become much better equipped for future crises, since the intervention would likely be a program designed to provide relief funds to these people. This would keep communities working efficiently even when not all people are able to work. Upper-class families would likely be at least moderately impacted by any intervention, as wealth would be redistributed or systemic advantages would be reduced. In response, they may choose to move their capital elsewhere to avoid regulation. Finally, the government and the nation as a whole would likely benefit from increased equality among constituents, as this would promote financial and economic stability.
Background sources
Economic inequality is at the root of many social issues. By looking into such inequality, we are able to examine people’s quality of life and identify determinants of the health and well-being of individuals and families. Looking at how economic well-being in the United States varies with social factors such as age/generation, race/ethnicity, income levels, and education will give us a glimpse into the stratification of American society and a basis to formulate potential solutions to this pressing problem. Some guiding questions that we hope to answer are as follows:
How has the distribution of wealth in the United States changed over the last decade?
How does the distribution of wealth vary with an individual’s level of education? What are the education dynamics observed from the dataset?
How does the distribution of wealth vary with race? What are the racial or ethnic dynamics observed from the dataset?
How has the COVID-19 pandemic affected wealth and income distribution? How do these observed dynamics vary between the start (2019/2020) and the middle (2021) of the pandemic?
As a first look at some of these questions, we have made a few charts to help visualize the distribution of wealth across the various factors using data from each of our datasets.
The clustered bar chart compares wealth across racial and ethnic groups before and during the pandemic. Because the data pair one categorical variable with one numerical variable, our group used a clustered bar chart to illustrate the inequality between groups. We also integrated interactive elements into the graph for better clarity. When the user selects a section of the chart with the left mouse button, the chart zooms into that section; clicking the auto-scale button in the top right corner restores the original scale; and hovering over a bar displays its value. According to the chart, White Americans’ wealth increased steadily before and during the pandemic, while minorities’ wealth remained mainly unchanged. Moreover, White Americans’ wealth is at least six times that of any minority group. The chart therefore suggests that the racial wealth gap persisted through the pandemic.
The pie charts show changes in wealth distribution by education level before and during the pandemic. Pie charts are simple but effective at displaying variables as percentage shares, so these charts provide valuable information about the concentration of wealth. The charts show an increase in the percentage share held by college-educated individuals and a decrease in the share held by individuals without a high school education. The charts also show that college-educated individuals control the vast majority of wealth (over 70%).
The line chart shows the change in wealth distribution across income percentile groups before and during the pandemic. A line chart makes it easy to compare each group's share of assets and gives clear insight into the gap between the groups. Furthermore, the graph shows that the highest income group has an income change that is 5 times greater than that of the lowest group. The lower income groups also appear to have fared worse during the pandemic, since their wealth changed little compared with that of the higher percentiles.
The primary dataset, titled Distributional Financial Accounts (DFA) - Income Levels, consists of 14 variables/attributes (columns) and 774 observations (rows). The secondary datasets, DFA - Net Worth Levels, Race Levels, and Education Levels, are comparable in structure and complexity. Each of these datasets contains the same variables, but the number of observations varies with the entries in the second column, “Category.” Thus, there is potential to study the intersection of some of these categories by merging or mutating the datasets.
Here is a table showing a quick summary of information provided by the primary data set, DFA - Income Levels.
| Statistic | Value |
|---|---|
| num_observations | 774.00 |
| num_variables | 14.00 |
| percent_inequity_2021 | 24.01 |
| percent_inequity_2018 | 21.09 |
| percent_inequity_1989 | 14.31 |
| diff_mean_median_2021_dollars | 12904157.33 |
| diff_mean_median_2018_dollars | 8670159.83 |
| diff_mean_median_1989_dollars | 895215.00 |
| ratio_of_liability_burden_2021 | 4.00 |
The first two entries show the number of observations (rows) and variables (columns) in the dataset, respectively. The next three entries describe the difference, in percentage points, between the shares of total net worth held by the Top 1% and the Bottom 20% of income earners in three key years: 2021 (during the pandemic), 2018 (before the pandemic), and 1989 (the earliest year for which data is available). These show the concentration of wealth increasing by about 3 percentage points between the pre-pandemic and pandemic eras, and by nearly 10 percentage points since 1989.
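The share-gap calculation described above can be sketched as follows. This is a minimal illustration, not the original analysis code: the function name `share_gap` is hypothetical, and the choice of quarter for each year (1989:Q3, 2018:Q4, 2021:Q3) is an assumption. With those rows, taken from the excerpt table later in this report, the result appears to match the three summary values.

```python
# Net worth by income group, copied from the excerpt table in this report.
# Assumption: one quarter stands in for each key year.
NET_WORTH = {
    "1989:Q3": {"pct99to100": 3506890, "pct80to99": 8885419, "pct60to80": 3404719,
                "pct40to60": 2505940, "pct20to40": 1517273, "pct00to20": 586690},
    "2018:Q4": {"pct99to100": 23597451, "pct80to99": 45244893, "pct60to80": 14415887,
                "pct40to60": 7678196, "pct20to40": 4243286, "pct00to20": 2910422},
    "2021:Q3": {"pct99to100": 36806136, "pct80to99": 59587088, "pct60to80": 21068915,
                "pct40to60": 9911057, "pct20to40": 5585346, "pct00to20": 3932744},
}


def share_gap(groups, top="pct99to100", bottom="pct00to20"):
    """Percentage-point gap between two groups' shares of total net worth."""
    total = sum(groups.values())
    return 100 * (groups[top] - groups[bottom]) / total


for quarter, groups in NET_WORTH.items():
    print(quarter, round(share_gap(groups), 2))
```

Because the gap is a difference of two shares of the same total, it is measured in percentage points rather than percent.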
Entries 6, 7, and 8 show the difference between the mean and median net worth across all income groups in the same three key years as above. A mean significantly higher than the median indicates a disproportionate concentration of wealth at the higher end of the income scale. The net worth of the 40-60% income group is used as a representative value for median net worth.
The last entry describes the ratio of liability burden, which we define as the ratio of liabilities to assets, between the bottom 20% and top 1% of income earners. This indicates that the bottom 20% faces roughly 4 times the liability burden of the top 1%.
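As a rough sketch of this calculation, the snippet below averages the six group totals for one quarter and subtracts the 40-60% group's net worth. The function name `mean_minus_middle` is hypothetical, and using the 2021:Q3 rows from the excerpt table is an assumption about which quarter the summary draws on.

```python
from statistics import mean

# 2021:Q3 net worth by income group, copied from the excerpt table in this report.
NET_WORTH_2021_Q3 = {
    "pct99to100": 36806136,
    "pct80to99": 59587088,
    "pct60to80": 21068915,
    "pct40to60": 9911057,
    "pct20to40": 5585346,
    "pct00to20": 3932744,
}


def mean_minus_middle(groups, middle="pct40to60"):
    """Mean of the group totals minus the middle group's total (median proxy)."""
    return mean(groups.values()) - groups[middle]


print(round(mean_minus_middle(NET_WORTH_2021_Q3), 2))
```

Note that the groups cover unequal slices of the population (the top group spans 1% of households, the next 19%), so this figure is best read as an indicator of skew across groups rather than a household-level mean and median.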
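The liability burden ratio can be illustrated with a short sketch. The helper names are hypothetical, and the 2021:Q3 figures from the excerpt table are assumed to be the inputs; with them the ratio comes out slightly above 4, consistent with the rounded table entry.

```python
# 2021:Q3 assets and liabilities, copied from the excerpt table in this report.
BALANCE_2021_Q3 = {
    "pct99to100": {"assets": 38261873, "liabilities": 1455737},
    "pct00to20": {"assets": 4646139, "liabilities": 713395},
}


def liability_burden(row):
    """Liabilities as a fraction of assets for one group."""
    return row["liabilities"] / row["assets"]


def burden_ratio(rows, numerator="pct00to20", denominator="pct99to100"):
    """How many times heavier the numerator group's burden is than the denominator's."""
    return liability_burden(rows[numerator]) / liability_burden(rows[denominator])


ratio = burden_ratio(BALANCE_2021_Q3)
print(f"{ratio:.2f}")
```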
Who or what is represented in the data?
The data paints a picture of the distribution of household wealth in the United States since 1989. It represents the household wealth statistics of 6500 families surveyed as part of the triennial Survey of Consumer Finances (and other families like them).
What is an observation? What variables are included?
Each observation contains data on net worth by category and fiscal quarter/year. Categories follow the titles of each of the datasets: the Net Worth Levels dataset contains categories “Top1,” “Next9,” “Next40,” and “Bottom50,” dividing households into groups based on percentiles of net worth. Similarly, the other datasets have categories dividing households into groups based on race, education levels, and income levels.
Each dataset has 14 variables, including Date, Category, Net.worth, Assets, and Liabilities.
Who collected the data? How was the data collection effort funded?
The data was collected by the National Opinion Research Center (NORC) at the University of Chicago, and the survey is sponsored by the Federal Reserve Board in cooperation with the Department of the Treasury. Data from the SCF is used by a multitude of organizations, from analysis by the Federal Reserve and news organizations to research by universities and economic research centers. More information on the source of the data and the SCF can be found at this link.
How was the data validated and held secure? Is it credible and trustworthy?
The SCF website describes in detail its policies regarding data confidentiality and security. According to the website, NORC uses a multi-tiered approach to manage issues associated with computer and data security. Personally Identifiable Information (PII) is used only to contact survey respondents, and the data is fully anonymized before it is provided to the SCF sponsor, the Federal Reserve Board. In its surveys, the NORC complies with several federal regulations on information security and management, including the Federal Information Security Management Act (FISMA).
The NORC and Federal Reserve make concerted efforts to ensure that the study is representative of households from all economic strata. Households are randomly selected according to guidelines laid out in working papers on the Federal Reserve website, with the intention of representing the full range of households in the United States. As per the Federal Reserve, “to maintain the scientific validity of the study, interviewers are not allowed to substitute respondents for families that do not participate. Thus, if a family declines to participate, it means that families like theirs may not be represented clearly in national discussions.” The representativeness of the data is therefore affected by households that do not participate, but on the whole, extensive measures appear to be taken to safeguard the credibility, security, and validity of the data.
How did you obtain the data?
Through a search for federal data on economic inequity, we came upon the Federal Reserve’s interactive data visualization titled “Distribution of Household Wealth in the U.S. since 1989” found at this link.
The visualization offered publicly downloadable data sets on each of the categories mentioned above.
For a brief glimpse at the Income data set, here is a table with several entries from the three key years we’ve identified above, broken down by quarter.
| Date | Category | Net.worth | Assets | Liabilities |
|---|---|---|---|---|
| 1989:Q3 | pct99to100 | 3506890 | 3649271 | 142381 |
| 1989:Q3 | pct80to99 | 8885419 | 10366937 | 1481518 |
| 1989:Q3 | pct60to80 | 3404719 | 4252640 | 847921 |
| 1989:Q3 | pct40to60 | 2505940 | 2931696 | 425755 |
| 1989:Q3 | pct20to40 | 1517273 | 1706702 | 189429 |
| 1989:Q3 | pct00to20 | 586690 | 650120 | 63430 |
| 1989:Q4 | pct99to100 | 3594228 | 3742691 | 148463 |
| 1989:Q4 | pct80to99 | 9089680 | 10591548 | 1501868 |
| 1989:Q4 | pct60to80 | 3469967 | 4338513 | 868546 |
| 1989:Q4 | pct40to60 | 2521558 | 2962965 | 441406 |
| 1989:Q4 | pct20to40 | 1523061 | 1722289 | 199228 |
| 1989:Q4 | pct00to20 | 593243 | 658812 | 65569 |
| 2018:Q1 | pct99to100 | 24359089 | 25535251 | 1176162 |
| 2018:Q1 | pct80to99 | 45462890 | 51742344 | 6279454 |
| 2018:Q1 | pct60to80 | 14444065 | 17847182 | 3403116 |
| 2018:Q1 | pct40to60 | 7495506 | 9495067 | 1999561 |
| 2018:Q1 | pct20to40 | 4139637 | 5302198 | 1162561 |
| 2018:Q1 | pct00to20 | 2540623 | 3187612 | 646989 |
| 2018:Q2 | pct99to100 | 24701330 | 25892076 | 1190746 |
| 2018:Q2 | pct80to99 | 45960520 | 52273088 | 6312567 |
| 2018:Q2 | pct60to80 | 14647876 | 18084169 | 3436293 |
| 2018:Q2 | pct40to60 | 7585712 | 9614280 | 2028568 |
| 2018:Q2 | pct20to40 | 4185822 | 5371946 | 1186123 |
| 2018:Q2 | pct00to20 | 2691857 | 3337163 | 645307 |
| 2018:Q3 | pct99to100 | 25324357 | 26527939 | 1203582 |
| 2018:Q3 | pct80to99 | 46513144 | 52877710 | 6364566 |
| 2018:Q3 | pct60to80 | 15065813 | 18549660 | 3483847 |
| 2018:Q3 | pct40to60 | 7690985 | 9755316 | 2064331 |
| 2018:Q3 | pct20to40 | 4271520 | 5482473 | 1210954 |
| 2018:Q3 | pct00to20 | 2818463 | 3463952 | 645489 |
| 2018:Q4 | pct99to100 | 23597451 | 24804284 | 1206833 |
| 2018:Q4 | pct80to99 | 45244893 | 51616549 | 6371656 |
| 2018:Q4 | pct60to80 | 14415887 | 17945704 | 3529817 |
| 2018:Q4 | pct40to60 | 7678196 | 9780885 | 2102689 |
| 2018:Q4 | pct20to40 | 4243286 | 5483937 | 1240651 |
| 2018:Q4 | pct00to20 | 2910422 | 3555114 | 644692 |
| 2021:Q1 | pct99to100 | 34161595 | 35586766 | 1425171 |
| 2021:Q1 | pct80to99 | 56771415 | 63780593 | 7009178 |
| 2021:Q1 | pct60to80 | 19952567 | 23653209 | 3700642 |
| 2021:Q1 | pct40to60 | 9357458 | 11595521 | 2238063 |
| 2021:Q1 | pct20to40 | 5163925 | 6511076 | 1347151 |
| 2021:Q1 | pct00to20 | 3406337 | 4098531 | 692194 |
| 2021:Q2 | pct99to100 | 36356825 | 37804671 | 1447845 |
| 2021:Q2 | pct80to99 | 58788449 | 65934906 | 7146457 |
| 2021:Q2 | pct60to80 | 20855077 | 24647834 | 3792756 |
| 2021:Q2 | pct40to60 | 9700928 | 11989989 | 2289061 |
| 2021:Q2 | pct20to40 | 5399293 | 6776392 | 1377099 |
| 2021:Q2 | pct00to20 | 3633057 | 4332045 | 698988 |
| 2021:Q3 | pct99to100 | 36806136 | 38261873 | 1455737 |
| 2021:Q3 | pct80to99 | 59587088 | 66845685 | 7258597 |
| 2021:Q3 | pct60to80 | 21068915 | 24953583 | 3884668 |
| 2021:Q3 | pct40to60 | 9911057 | 12252028 | 2340971 |
| 2021:Q3 | pct20to40 | 5585346 | 6988641 | 1403295 |
| 2021:Q3 | pct00to20 | 3932744 | 4646139 | 713395 |
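An excerpt like the one above could be produced by parsing the quarter labels and keeping only rows from selected years. The sketch below is an illustration under stated assumptions: the inline `SAMPLE_CSV` is a small stand-in built from rows in the table (the real download and its exact column names may differ), and `parse_quarter` and `rows_for_years` are hypothetical helper names.

```python
import csv
import io

# Stand-in for the downloaded file; rows copied from the table above.
# Assumption: the CSV header matches the column names shown in this report.
SAMPLE_CSV = """Date,Category,Net.worth,Assets,Liabilities
1989:Q3,pct99to100,3506890,3649271,142381
1989:Q3,pct00to20,586690,650120,63430
2018:Q1,pct99to100,24359089,25535251,1176162
2021:Q3,pct00to20,3932744,4646139,713395
"""


def parse_quarter(label):
    """Split a label such as '2021:Q3' into (year, quarter) integers."""
    year, quarter = label.split(":Q")
    return int(year), int(quarter)


def rows_for_years(csv_text, years):
    """Return the rows whose Date falls in one of the given years."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if parse_quarter(row["Date"])[0] in years]


# Keep only 1989 and 2021; the 2018 row is filtered out.
selected = rows_for_years(SAMPLE_CSV, {1989, 2021})
for row in selected:
    print(row["Date"], row["Category"], row["Net.worth"])
```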
Here are some aggregate data points, taking the mean for three variables - net worth, assets, and liabilities - for each income percentile group.
| Category | Net.worth | Assets | Liabilities |
|---|---|---|---|
| pct00to20 | 1739245 | 2147172 | 407926.5 |
| pct20to40 | 3070031 | 3815979 | 745947.6 |
| pct40to60 | 5066966 | 6468809 | 1401842.7 |
| pct60to80 | 9071412 | 11466366 | 2394953.6 |
| pct80to99 | 25668496 | 29972896 | 4304400.6 |
| pct99to100 | 12786803 | 13468352 | 681548.7 |
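The per-group averaging behind a table like this can be sketched with the standard library. The rows below are only the 2021 net worth values for two groups from the earlier excerpt, so the resulting means will not match the table above, which presumably averages over many more quarters. The function name `mean_by_category` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (Date, Category, Net.worth) rows copied from the 2021 entries in the excerpt.
ROWS = [
    ("2021:Q1", "pct99to100", 34161595),
    ("2021:Q2", "pct99to100", 36356825),
    ("2021:Q3", "pct99to100", 36806136),
    ("2021:Q1", "pct00to20", 3406337),
    ("2021:Q2", "pct00to20", 3633057),
    ("2021:Q3", "pct00to20", 3932744),
]


def mean_by_category(rows):
    """Average the value column separately for each category."""
    grouped = defaultdict(list)
    for _date, category, value in rows:
        grouped[category].append(value)
    return {category: mean(values) for category, values in grouped.items()}


means = mean_by_category(ROWS)
for category, value in means.items():
    print(category, round(value, 1))
```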
After a first look at the datasets and our research questions, we expect wealth to correlate with both race and income level, and we expect to see a generally increasing trend of wealth inequality across all categories over the last few years. Specifically, we expect to find that racial minorities experienced a more unequal distribution of wealth during the COVID-19 pandemic. We hope to explore some of the intersectional and cross-cutting trends of inequality through our analysis, which will inform our efforts to formulate potential solutions. Based on our current assessment of the sociopolitical landscape in the US, we have identified some possible avenues for economic inequality reduction that might warrant consideration.
First, the rapid, pervasive digitization of the world economy induced by the pandemic has made high-speed internet and computers necessary for nearly all learning and working environments. However, access to such technology is dictated to a great extent by economic well-being, and is very limited for people in lower wealth strata. Therefore, technologists have a prime opportunity to improve technological equity by widening high-speed internet reach and through programs such as renting out affordable computers near low-income areas. Educators, designers, and technologists can collaboratively use this broadened network to design and provide educational content that enables individuals from lower socioeconomic backgrounds to learn new skills, thus improving social mobility. Policymakers can use wealth distribution data to inform the development of social programs such as subsidies toward affordable technology, rent and eviction moratoriums, public transportation systems, and unemployment assistance. Importantly, analyses of wealth distribution would give policymakers an important reference point when redesigning taxation systems in the US.
Although the graphs provided by the Federal Reserve at the above-mentioned website are a practical tool for analyzing data and drawing conclusions, there are some limitations to consider when examining the data. Firstly, the website mentions that the sample size for the data is 6500 families, whereas the population of the United States is 329.5 million. While these families have been chosen to form a cross-section of American society as a whole, our team cannot know to what extent the sample truly represents the population. Representativeness is crucial when analyzing the data because a representative sample reduces bias in our conclusions. In addition, the graphs do not explicitly address income inequality during the pandemic or the factors that influence it. Our team has to infer these from the charts by comparing different years and by reading several articles on the various factors contributing to economic inequality. There is always the possibility that our team might unknowingly bring our own biases and prior knowledge to the interpretation of the data, leaving us unaware of other possible factors and confounding variables that have affected it. Nonetheless, the datasets are an effective tool for understanding the income and wealth distribution in the US.
Batty, M., Deeken, E., & Volz, A. H. (2021, August 30). Wealth inequality and covid-19: Evidence from the distributional financial accounts. The Fed - Wealth Inequality and COVID-19: Evidence from the Distributional Financial Accounts. Retrieved February 6, 2022, from https://www.federalreserve.gov/econres/notes/feds-notes/wealth-inequality-and-covid-19-evidence-from-the-distributional-financial-accounts-20210830.htm
Board of Governors of the Federal Reserve System. (2017, March 16). Retrieved February 6, 2022, from https://www.federalreserve.gov/econres/aboutscf.htm
Board of Governors of the Federal Reserve System. (2021, February 23). Retrieved February 6, 2022, from https://www.federalreserve.gov/econres/scf_workingpapers.htm
Board of Governors of the Federal Reserve System. (2021, December 17). The Fed - Distribution: Distribution of Household Wealth in the U.S. since 1989. Retrieved February 6, 2022, from https://www.federalreserve.gov/releases/z1/dataviz/dfa/distribute/chart/#quarter:128;series:Net%20worth;demographic:networth;population:all;units:levels;range:2006.3,2021.3
Buckley, P., Barua, A., & Samaddar, M. (2021, November 10). US income inequality after the pandemic. Deloitte Insights. Retrieved February 6, 2022, from https://www2.deloitte.com/us/en/insights/economy/issues-by-the-numbers/covid-impact-on-income-inequality.html
NORC at the University of Chicago. (n.d.). Data security. SCF. Retrieved February 6, 2022, from https://scf.norc.org/DataSecurityandPrivacy.html
Tracking the COVID-19 economy’s effects on food, housing, and employment hardships. Center on Budget and Policy Priorities. (2021, November 10). Retrieved February 6, 2022, from https://www.cbpp.org/research/poverty-and-inequality/tracking-the-covid-19-economys-effects-on-food-housing-and